STOCHASTIC RECEDING HORIZON CONTROL WITH BOUNDED 
CONTROL INPUTS: A VECTOR SPACE APPROACH 

DEBASISH CHATTERJEE, PETER HOKAYEM, AND JOHN LYGEROS 

Abstract. We design receding horizon control strategies for stocliastic discrete- 
time linear systems with additive (possibly) unbounded disturbances, wliile obey- 
ing hard bounds on the control inputs. We pose the problem of selecting an appro- 
priate optimal controller on vector spaces of functions and show that the resulting 
optimization problem has a tractable convex solution. Under the assumption that 
the zero-input and zero-noise system is asymptotically stable, we show that the 
variance of the state is bounded when enforcing hard bounds on the control in- 
puts, for any receding horizon implementation. Throughout the article we provide 
several examples that illustrate how quantities needed in the formulation of the 
resulting optimization problems can be calculated off-line, as well as comparative 
examples that illustrate the effectiveness of our control strategies. 
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§1. Introduction 

Receding horizon control is a popular paradigm for designing control policies. 
In the context of determuristic systems it has received a considerable amount 
of attention over the last two decades, and significant advancements have been 
made in terms of its theoretical foimdations as well as industrial applications. 
The motivation comes primarily from the fact that receding horizon control yields 
tractabile control laws for deterministic systems in the presence of constraints, and 
has consequently become popular in the industry. The counterpart in the context 
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of stochastic systems, however, is a relatively recent development. In this article 
we solve the problem of stochastic receding horizon control for linear systems 
subject to additive (possibly) unbounded disturbances and hard norm bounds on 
the control inputs, over a class of feedback policies. Methods for guaranteeing 
hard bounds on the control inputs, within our context, while ensuring tractability 
of the underlying optimization problem are, to our knowledge, not available in the 
current literature. Preliminary results in this direction were reported in [HCL09]. 

In the deterministic setting, the receding horizon control scheme is dominated by 
worst-case analysis relying on robust control and robust optimization methods, see, 
for example, [Ber05, MRRSOO, BM99, LHBW07, MacOl, Bla99, FB05, YB09, RH05] 
and the references therein. The central idea is to synthesize a controller based 
on the bounds of the noise such that a certain target set becomes invariant with 
respect to the closed-loop dynamics. However, such an approach tends to yield 
rather conservative controllers and large infeasibility regions. Moreover, assigning 
an a priori bound to the noise seems to demand considerable insight. A stochastic 
model of the noise is a natural alternative approach to this problem: the con- 
servativeness of worst-case controllers may be reduced, and one may not need to 
impose any a priori bounds on the maximum magnitude of the noise. In [BB07], the 
authors reformulate the stochastic programming problem as a deterministic one 
with boimded noise and solve a robust optimization problem over a finite horizon, 
followed by estimating the performance when the noise is unbounded but takes 
high values with low probability (as in the Gaussian case). In [PS09] a slightly dif- 
ferent problem is addressed in which the noise enters in a multiplicative manner, 
and hard constraints on the states and control inputs are relaxed to constraints 
resembling the integrated chance constraints of [Han83] or risk measures in math- 
ematical finance. Similar relaxations of hard constraints to soft probabilistic ones 
have also appeared in [CKW08] for both multiplicative and additive noise inputs, 
as well as in [OJM08] for additive noise inputs. There are also other approaches, 
e.g., those employing randomized algorithms as in [BW07, Bat04, MLL05]. Related 
lines of research can be found in [vHB03, vHB06] dealing with constrained model 
predictive control (MFC) for stochastic systems motivated by industrial applica- 
tions, in [RCMA+09, BSW02, SSW06] dealing with stochastic stability, in [SB09b] 
dealing with Q-design, in, e.g., [LH07, LHC03] dealing with alternative approaches 
to control under actuator constraints and neural-network approximation. The ar- 
ticles [ACCL09, CACL09] deal with a formulation that allows probabilistic state 
constraints but not hard input constraints, and is hence complementary to the ap- 
proach in the present article, and [HCCLIO] treats the case of output feedback. . 
Finally, note that probabilistic constraints on the controllers naturally raise difficult 
questions on what actions to take when such constraints are violated, see [CCCL08] 
and [CF09] for partial solutions to these issues. 

The main contributions of the article are as follows: We give a tractable, convex, 
and globally feasible solution to the finite-horizon stochastic linear quadratic (LQ) 
problem for linear systems with possibly imbounded additive noise and hard con- 
straints on the elements of the control policy. Within this framework one has two 
directions to pursue in terms of controller design, namely, a posteriori bounding the 
standard LQG controller, or employing certainty-equivalent receding horizon con- 
troller. While the former direction explicitly incorporates some aspects of feedback, 
the S5mthesis of the latter involves control constraints and implicitly incorporates 
the notion of feedback. Our choice of feedback policies explores the middle ground 
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between these two choices: we explicitly incorporate both the control bounds and 
feedback at the design phase. More specifically, we adopt a policy that is affine in 
certain bounded functions of the past noise inputs. The optimal control problem 
is lifted onto general vector spaces of candidate control functions from which the 
controller can be selected algorithmically by solving a convex optimization prob- 
lem. Our novel approach does not require artificially relaxing the hard constraints 
on the control input to soft probabilistic ones (to ensure large feasible sets), and 
still provides a globally feasible solution to the problem. Minimal assumptions 
of the noise sequence being i.i.d and having finite second moment are imposed. 
The effect of the noise appears in the convex optimization problem as certain fixed 
cross-covariance matrices, which may be computed off-line and stored. 

Once tractability of the optimization problem is ensured, we employ the result- 
ing control policy in a receding horizon scheme. Under our policies the closed- 
loop system is in general not necessarily Markovian, and as a result stability of 
the closed-loop system is not immediate. In fact, we can no longer appeal directly 
to standard Foster-Lyapimov methods. We establish that our receding horizon 
control scheme provides stability under the assumption that the zero-input and 
zero-noise system is asymptotically stable. We provide examples that demonstrate 
the effectiveness of our policies with respect to standard methods such as certainty- 
equivalent MFC, standard unconstrained LQG and saturated LQG control. These 
examples show that our policies perform no worse than the standard unconstrained 
LQG controller in the absence of control constraints, and outperform the certainty- 
equivalent MFC as well as the saturated LQG control by a significant margin. 

Our mechanism for selection of a policy consists of two steps: The first concerns 
the structure of our policies, and is motivated by preceding work in robust opti- 
mization and MFC [Lof03, BTGGN04, GKM06]. The second concerns the procedure 
for selection of an optimal policy from a general vector space of candidate con- 
trol functions, and is inspired by approximate djmamic programming techniques 
[BT96, LR06, SS85, dFR03, Fow07]. With respect to the first step, our policies are 
more general compared to those in [L6f03, BTGGN04, GKM06]. With respect to 
the second, the selection procedure of our policies consists of a one-step tractable 
static optimization program. 

The rest of this article is organized as follows. In Section 2 we state the main 
problem to be solved in the most general form. In Section 3 we provide a tractable 
solution to the finite horizon optimization problem on general vector spaces. This 
result is specialized to various classes of noise and input constraint sets in Section 4. 
Stability of receding horizon implementations of the obtained closed-loop policy is 
shown in Section 5, and input-to-state stability properties are discussed in Section 
5.2. We provide a host of numerical examples that illustrate the effectiveness of our 
approach in Section 6. Finally, we conclude in Section 7 with a discussion on future 
research directions. 

Notation. Hereafter, N := {1,2, . . .} is the set of natural numbers. No N U {0}, 
Z is the set of all the integers, ]R>o is the set of nonnegative real numbers, and C 
denotes the set of complex numbers. We let denote the indicator function of 
a set A, and I„x)i and 0„xm denote the n-dimensional identity matrix and n x m- 
dunensional zeros matrix, respectively. Let 1 1 ■ 1 1 denote the standard Euclidean norm, 
and IHIp denote the usual {p norms. Also, let Ej-J-] denote the expected value given 
xq, and tr(-) denote the trace of a matrix. If Mi and M2 are two matrices with the 
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same number of rows, we employ the standard notation [Mj | M2] for the matrix 
obtained by stacking the columns of Mj followed by the columns of M2 . For a given 
symmetric n-dimensional matrix M with real entries, let { A,(M) | f = 1, . . . , n} be the 
set of eigenvalues of M, and let Amax(M) := max, A,(M) and Amm(M) := min, A,(M). 
Finally, for a random vector X let Ex denote the matrix E|^XX^j and fix denote the 
vector e[x]. 



§2. Problem Statement 
Consider the following discrete-time stochastic d5mamical system: 
(2.1) xt+i = Axt + But + wt, t e No, 

where Xt e R" is the state, Ut is the control input taking values in some given control 
set U c R™ to be defined later, A e R"^", S e R"^"', and {wt)tef~So is a sequence of 
stochastic noise vectors with iff e W c R" . We assume that the initial condition xq 
is given and that, at any time t, Xf is observed perfectly. We do not assume that the 
components of the noise Wt are uncorrelated, nor that they have zero mean; this 
effectively means that Wt may be of the form Fw'^ + b for some noise Wf e W whose 
components are uncorrelated or mutually independent, F e R"^P, and b e M". 
Without loss of generality we shall stick to the simpler notation of (2.1) throughout 
this article. The results readily extend to the general case of Wf = Fiu'^ + b, as can be 
seen in [HCL09]. 

Generally, a control policy tt is a sequence (ttq, tti, 712, . . .) of Borel measurable maps 
TTf : R" X ■ ■ ■ X R" ^ U, t e No. Policies of finite length such as (nt, nt+\, nt+N-i) 

k(t)- times 

will be denoted in the sequel by TTf:f+jv-i. 

Fix an optimization horizon N e N and let us consider the following objective 
function at time t given the state Xt: 



(2.2) 



V, ■- E 



N-l 



'Y' ,{^J+kQkXt+k + uJ_^_i^RkUt+k'j + xJ_^_^Q]^Xt+N 



Xt 



where Qt > 0,Rt > 0, Qn > are some given symmetric matrices of appropriate 
dimension. At each time instant f, we are interested in minimizing (2.2) over the 
class of causal state feedback strategies n defined as: 



(2.3) 



Wf 
Wf+l 

Mf+N-1 



7If(Xf) 

'^t+l{Xt,Xt+l) 

7Tf+A/_i(Xf,Xf+i, • ■ ■ ,Xf+jv-l) 



for some measurable functions Ut-.t+N-i '■- {^if, ■■■ ,7Tt+iv-i} G H, while satisfying 
Mf e U for each t. The receding horizon control procedure for a given control horizon 
Nc e (1, . . . ,N) and time t can be described as follows: 

(a) measure the state Xt; 

(b) determine an admissible optimal feedback control policy, say tt* e Tl, 
that minimizes the N-stage cost function (2.2) starting from time t, given the 
measured initial condition Xt; 

(c) apply the first Nc elements hJ.j^j^ _^ of the policy n*.^_^J^_^•, 

(d) increase ttot + Nc, and go back to step (a). 
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In this context, if Nc = 1 then this is usual MFC, and if Nc = N, then it is usually 
known as rolling horizon control. 

Since both the system (2.1) and cost (2.2) are time-invariant, it is enough to 
consider the problem of minimizing the cost for t = 0. In view of the above we 
consider the problem: 

(2 4) min \ Vq d5mamics (2.1), and Mf e CJ for each t\. 

If feasible, the problem (2.4) generates an optimal sequence of feedback control 
laws n* = {n*,--- 

The evolution of the system (2.1) over a single optimization horizon N can be 
described in a compact form as follows: 



(2.5) 
where 



X - Axq + Bu + Dw, 



X := 



Xo 




Uq 




Wo 






X\ 




Ml 




Wi 


A- 


A 




u := 




, w := 










MN-1 











B := 



s 

AS 



AS S 



D := 



0)ixn 
Inxi; 

A 

AN-l 



OlIXIl 



OnXH 

IflXH 



Using the compact notation above, the optimization Problem (2.4) can be rewritten 
as follows: 

(2.6) min Ie.vJx^Qx + u^Ru \ | d5mamics (2.5), u e TUl, 

where Q = diaglQo, . . . , Qn), R = diag{Ro, ■ • .,Rn-i], and U tJ x . . . x U. 

N-times 

§3. Main Result 

We require that our controller is selected from a vector space of candidate con- 
trollers spanned by a given set of "simple" basis functions. The precise algorithmic 
selection procedure is based on the solution to an optimization problem. The basis 
functions may represent particular types of control functions that are easy or inex- 
pensive to implement, e.g., minimum attention control [Bro97], or may be the only 
ones available for a specific application. For instance, piecewise constant policy 
elements with finitely many elements in their range may be viewed as controllers 
that can provide only finitely many values; this may be viewed as an extended 
version of a bang-bang controller, or as a hybrid controller with a finite control 
alphabet. 

More formally, let "K be a nonempty separable vector space of functions with 
the control set U as their range, i.e., "H is the linear span of measurable fimctions 
e^' : W ^ U, where v e J - an ordered countable index set (see [Lue69] for more 
details). As mentioned above, the elements of 'H may be linear combinations of 
typical "simple" controller functions for t - 0,1, . . . ,N - 1. We are interested in 
policies of the form Ut = rjt + 2^[lo where rjt is an m-dimensional vector 
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and each component of the 7«-dimensional vector-valued function ij^tj is a member 
of 'H. Although this feedback function is directly from the noise, since the state 
is assumed to be perfectly measured, from the system dynamics (2.1) it follows 
at once that this controller Ut is actually a feedback from all the states xq, . . . ,Xf. 
Indeed, in the spirit of [Lof03, BTGGN04, GKM06, SB09a] we have 

Mo = rjo, 

Ml = 7]i + i/'i,o(xi - Axo - Brio), 

1/2 = 7]2 + 4>2,o{^l - AXo - B7]o) + !/'2,l(^2 " Axi - B{r]i + ipi,o{xi - Axo - Sr]o))), 



In other words, by construction, Mj is generally a nonlinear feedback controller 
depending on the past t states.^ Also by construction, it is causal. 
Our general control policy can now be expressed as the vector 



(3.1) M = J] + ^(if) :■= 

















+ 








(PN-i{Wo,ZVi,...,Wn.2)_ 



where, 

• (po = 0, 

• Wt for t = 0, . . . , N - 1 is the f-th random noise vector, 

• r]( is an m-dimensional vector for t = 0, . . . ,N - 1, 

• (pt{zvQ, Wt-i ) - I^llg ;) for f = 1, . . . , JV - 1 is an m-dimensional vector, and 

• each function (ptj belongs to the linear span of the basis elements (e^)vej, and thus 
has a representation as a linear combination ^f^,(-) - LveJ ^5 ,^^(')/ ^ = 

i - 0, . . . ,t - 1, where 0J' . are matrices of coefficients of appropriate dimension. 

Analogous to Fourier coefficients in harmonic analysis, we call the 0];'^ the v-th 
Fourier coefficient of the function qjtj- Therefore, whenever \I\ < oo for every 
t = 1, . . . ,N - 1, we have the finite representation 
(3.2) 

e(roo) 



(pt{wo,...,Wt-i) = [6t,o 9t,i ■■■ 9, 



f,t-i 











J^mxiiUKN-l) 



e(^fN-2) 



]R,.|J|(N-1)X1 



where 0f , e IR'"''"!-^!, e M' 



mxn\I\ 



Pel 



(Wi) 



el^l(a;,) 



Vf = 0,1,--- ,N-2. 



Note that the controller input at time f is non-Markovian as it is a function of the state vectors at 
all the previous times and not just on Xt-i. 
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In this notation the policy (3.1) can be written as 
















01,0 








(3.3) M = 7] + (p{w) = ri + 


02,0 


02,1 







9n-\,o 


0N-1,1 ■ ■ 


Sn-\,n-2_ 



e(jfN-2) 



=: J] + 0e(zi;), 



where is now the matrix of Fourier coefficients having dimension Nm x (n{N — 

1)1 J|). This Fourier coefficient matrix and the vector rj play the role of the opti- 
mization parameters in our search for an optimal policy. Note that t{w) does not 
include the noise vector iv^-i, and that is strictly lower block triangular to enforce 
causality. In what follows, as a matter of notation, by 0f we shall denote the formal 
t-th block-row of the matrix in (3.3), i.e., 0f := [0f,o • ■ ■ dt,t-i ■ ■ • oj, 
for t = 0, ■ ■ ■ ,N - 1, with 0o being the identically row. We make the following 
assumption: 

Assumption 3.1. The sequence (wOteNo of noise vectors is i.i.d with L = E^iffW^j.O 

So far we have not stipulated any boundedness properties on the elements of the 
vector space Ti. This means that the control policy elements may be unbounded 
maps. First we stipulate the following structure on the control sets: 

For a given p e [1, oo], the control input vector Ut is bounded in p-norm at each 
instant of time t, i.e., for p e [1, oo] let LfmL > be given, with 



(3.4) 



Ut e Up := [£, e R" 
Up ■- Up X . . 



< fJmL} Vie No, and 



t iisiip 
xU, 



N-times 



One could easily include more general constraint sets Up, for instance, to capture 
bounds on the rate of change of inputs. 
Our basic result is the next Theorem. 



Theorem 3.2. Consider the system (2.1). Suppose that Assumption 3.1 holds, 
is finite-dimensional < oo), and every component of the basis functions t^' is 
bounded byG > in absolute value. Then the problem (2.6) under the policy (3.1) 
and control sets (3.4) forp e [1, oo] is convex with respect to the decision variables 
(r], 0) defined in (3.3). Forp = 1,2^ and oo it admits convex tractable versions with 
tighter domains of {i], &), given by 

miiiimize tr(0"^(B"^QB + r)@L,) + 2 tr(0"^B"^QDE;,) + ?/(b"^QB + Rji] 

+ 2(xlA^QBr] + rj^B^QD^/.^ + xjA^QBO^ie) 
-I- 27/(k + B"^QB)0^e + c 
^•^■^^ subject to @ strictly lower block triangular as in (3.3), 



1 : 



2 : 



£f||0f|li^!j(;iL 



Vf = 0,1,.. 



,N-1, 
.,N-1, 
0,1,...,N-1, 
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where 

Ee ]E[e(a;)e(a;)"^], E', := E[a;e(a;)"^], 

■- E[wl ^i, ■- E[i{w)], c ■- xlA^QAxo + IxlA^QD^i + ^D^QDE^,). 

Proof of Theorem 3.2. It is easy to see that x^Qx + m^Rm is convex nondecreasing, and 
both X and u are affine functions of the design parameters (j], 0) for any realization 
of the noise w. Hence, Vq is convex in (?], 0) since taking expectations of a convex 
function retains convexity [BV04, Section 3.2]. Moreover, the control constraint sets 
in (3.4) are convex in (j], 0). This settles the first claim. 
The objective function (2.2) is given by 

E;,„[(Axo + Bu + DwfQ(Axo +Bu + Dw)] + E.^„[m"^Rm] 

= E;f„[(Axo + B{r] + @t{zv)) + DwfQ(Axo + B(/] + 0e(a;)) + Dzvj] 

+ E,v„[(t7 + &t{w)fR{ri + @i{w))] 
= xlA^QAxo + IxIA^QBt] + i]'^(b'^QB + r)t] 

+ 2(Axo + B7])^Q(B0E,Je(zi;)] + DE,,[w]) 

+ E.v„[(B0e(i(;) + Dwf Q{B&i{w) + Dw'j\ + E,,„[(0e(i(;))"^R0e(w;)] 
= xIa'^QAxo + Ix^A'^QBrj + /]"^(b'^QB + r)/] + 2ri^ R&E[t{w)] 
+ 2{Axo + BrfQ{DE,„[zv] + B0E[e(zi;)]) + tr(D'' QDE,^[w'']) 
+ tr(0"^(B"^QB + R)0E.^„[e(a;)e(a;)"^]) + 2tr(0"^B"^QDE,,„[z<;e(a;)"^]). 

Incorporating the definitions Ze, fie, and c, the right-hand side above equals 

tr(0"^(B"^QB + r)0Ec) + 2 tr(0"^B"^QDE;,) + 7]"^(b"^QB + k)?] 

+ 2(xXQB?? + rfB'^QD^i,, + x^A'^QBe^,) + 2rf(R + B'^QB)e^i, 
+ (xjA^QAxo + 2xJaTQD^„, + tr(DTQDE,„)) 

= tr(0"^(B"^QB + k)0Ec) + 2 tr(0"^B"^QDE;,) + 7/(b"^QB + k)?] 
+ 2(xXQB'? + ']^BTqd^,, + xjATQB0^ie) + 2Tf{R + B'^QB)&^i, + c. 

Since the matrix Ec is positive semidefinite, it can be expressed as a finite nonneg- 
ative linear combination of matrices of the type aa^ , for vectors a of appropriate 
dimension [BSM03, Theorem 1.10]. Accordingly, if Ec = Ef=i Oio], then 

k 

ir{&^{B^QB + R)0Ee) = ^ tr(0"r(B"^QB + R)@a,a]) 

!=1 

k 

= YXa]@'(B''QB+R)@a). 

!=1 



STOCHASTIC RECEDING HORIZON CONTROL WITH BOUNDED CONTROL INPUTS 



9 



(3.6) 



Defining 0, :- 0a, and adjoining these equalities to the constraints of the opti- 
mization program (3.5), we arrive at the optimization program 

k 

minimize Y &J(b^QB + r)©, + 2 tr(0"^B"^QDZ;,) + if(B^QB + r)i] 
(e,ei,...,0j 

+ 2(xlA'QBi] + rfB'^QD^ + xJA^QB©^') 
+ 2if(R + B'^QB)@ii' + c 
subject to strictly lower block triangular as in (3.3), 
&i = &a, forallf = 

We see immediately that (3.6) is a convex program in the parameters i], and 0„ 
and is equivalent to the cost in (3.5). 

It only remains to consider the last constraint in (3.5). First we consider the cases 
of p = 1, oo. Using the notation above, an application of the triangle inequality 
immediately shows that the constraints can be written as 



(3.7) 



= 1: ||/]4+£f||0f||i < U; 



(1) 



(CX3) 



Vf = 0,1,...,N-1, 
Vf = 0,1,. ..,N- 1. 



It follows that the objective function in (3.6) is quadratic and the constraints in (3.6)- 
(3.7) are affine in the optimization parameters ?], 0, and 0. As such, for p = 1, oo 
our problem is a quadratic program. 

For the case of p = 2, note that ift + 0fe(a;) = ^rjt 0fj 



, and by definition 



of £ it is clear that 



translates to 



1 



^rjt 0fj Vl + Gt . This immediately 



Vl + St < !Jmax/ which is the third constraint in Problem 
3.5 and it is a quadratic constraint in the optimization parameters (rj, 0). Therefore, 
for p = 2 our problem is a quadratically constrained quadratic program. □ 



The optimization problem (3.5) simplifies if we assume that fi' = E[e(5X')] = 0. 
Note that E[e(a;)] = if and only if E[eJ'j.(a;f ,)] = for all vel.At an intuitive level 
this translates to the condition that the functions & 1-1 should be "centered" 
with respect to the random variables zvt i. In particular, this simply means that for 
noise distributions that are symmetric about 0, the functions e^' should be centered 
at and be antisymmetric. For example, if the noise is Gaussian with mean and 
diagonal covariance matrix (uncorrelated components), each component of the 
functions e^' should be an odd function. 

The matrices Zc, E'^, the vector v, and the number c in Theorem 3.2 are all con- 
stants independent of xq, and can be computed off-line. As such, even if closed-form 
expressions for the entries of the matrices do not exist, they can be numerically 
computed to desired precision. The optimization problem (3.5) is a quadratic pro- 
gram [BV04, p. 152] for p - 1, oo, and a quadratically constrained quadratic pro- 
gram [BV04, p. 152] for p = 2, in the optunization parameters jrj, 0, |0„ f = 1, . . . , /c||, 
and can be easily coded in standard software packages such as cvx [GBOO] or 
YALMIP [Lof04]. Note that the optimization problem (3.5) is always feasible (simply 
set = and r] = to see this). This is not a surprise, since there are no constraints 
on the state, and by construction e U. Finally, note that the third constraint in 
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Problem (3.5) for various values of p, is a result of robustly satisfying the constraints 
posed by the various control sets (3.4) for any realization of the noise w. 

In general, the total number of decision variables in the optimization program 
(3.5) ismN{l + ln{N-l)\I\). The number of decision variables can be substantially 
reduced, e.g., by choosing 'H to be 1 -dimensional, or by fixing certain (block) 
elements of the Fourier coefficient matrix to 0. 



§4. Various Cases of Constrained Controls 

We examine in this section several special cases of Theorem 3.2 under various 
restrictions on the classes of noise and control inputs. 



§4.1. Bounded controls, unbounded noise, and p = co. Let the noise take values 
in ]R" . We provide tractable convex programs to design a policy that by construction 
respects the control constraint sets (3.4), with p = co. Starting from (3.1) let 



(4.1) 



u = rj + @(p{zv), 



where 



(p{w) := 



(Po 



• (pQ = 0,(pt{'Wo,..-rW,.i) = L]=o^'t<Pt,i{wj) for t = l,...,N-2, and 

• (pt,j{zvj) = ^(p{zvj_i), ... , (p{Wj^n)\ for some function cp such that sup (p{s) 
oo, and (pt,j : W Uoo. 



seR 



In other words, we saturate the measurements that we obtain from the noise input 
vector before inserting them into our control vector. This way we allow that the 
noise distribution is supported over the entire R", which is an advantage over 
other approaches [BB07, GKM06]. Moreover, the choice of the component satura- 
tion function (p is left open as long as the noise sequence satisfies Assumption 3.1. 
For example, we can accommodate standard saturation, piecewise linear, and sig- 
moidal functions to name a few. 

Our choice of saturating the measurement from the noise vectors, as we shall 
see below, renders the resulting optimization problem tractable as opposed to 
calculating the entire control input vector u and then saturating it a posteriori; one 
can see that the latter approach tends to lead to an intractable optimization problem. 
Note also that the choice of control inputs in (4.1) yields a possibly non-Mar kovian 
feedback. 



Corollary 4.1. Consider the system (2.1). Suppose that Assumption 3.1 holds, and 
]E[e(a;)] = with i{w) = cp{w), where cp is defined in (4.1). Then for p = oo the 
problem (2.6) under the control policy (4.1) is a convex optimization program with 
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respect to the decision variables (r], &), given by 



minirnize tr{&^{R + B^QB^&Ti^ + 2tr{DQB&T2) 



(>j,e) 




and strictly lower block triangular as in (3.3), 
where rit.i snd &tj are the i-th rows of rjt and &t, respectively. 



c = x^AQAxq + tr(DTQDE^), 

b = IB'^QAxo, 

Ti = diag{E[(po(w^o)'Po(j^o)^]/ ■ ■ ■ ,'E[(Pn-i{wn-i)<Pn-i{wn-iV]}, 
T2 = diag{E[^o(w^o)w^J], • ■ ■ /^[(pN-iC^yN-Ow^N-i]}- 



The resulting policy is guaranteed to satisfy the control constraint set (3.4) for 



A complete proof may be found in [HCL09]; it proceeds along the lines of the 
proof of Theorem 3.2. Note that the program (4.2) exactly solves (2.6) under the 
policy (4.1) and is neither a restriction nor a relaxation. 

Problem (4.2) is a quadratic program in the optimization parameters {!],&) (see 
the discussion following Theorem 3.2). The matrices Fi and T2 capture the statistics 
of the noise in the presence of the functions cp and can be computed numerically off- 
line using Monte Carlo techniques [RC04, Section 3.2]. This method will be utilized 
in the examples in Section 6. However, in some instances it is actually possible to 
compute these matrices in closed form; this is shown in the next three examples. 

Example 4.2. Let us consider (2.1) when the noise process {wt)t€Ka is an i.i.d se- 
quence of Gaussian random vectors of mean and covariance E and standard 
sigmoidal policy functions cp, i.e., <p(f) '.= t/ Vl + . Assume further that the com- 
ponents of Wf are mutually independent, which implies that E is a diagonal matrix 
diaglffj, . . . , a^}. Then from the identities in Fact 1 in §A.l, we have for i = 1, . . . ,n 



p — 00. 



and] = 0,...,N-1, 



This shows that the matrix Fi in Corollary 4.1 is diag{E', . . . , E'}, where 





ylnOiJ-oa ^jlnoiJ-oa Vl + 



e 2"' dt 
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where U is the confluent hypergeometric function (defined in the Appendix), the 
matrix T2 in Corollary 4.1 is diag{E", . . . , E"}, where 

Therefore, given the system (2.1), the control policy (4.6), and the description of 
the noise input as above, the matrices Fi and r2 derived above complete the set of 
hypotheses of Corollary 4.1. The problem (2.4) can now be solved as the quadratic 
program (4.2). A 

Example 4.3. Consider the setting of Example 4.2 (with (p a standard sigmoid) under 
the assumption that E is a not necessarily diagonal matrix. To wit, the components 
of Wt may be correlated at each time t e No; however, the random vector sequence 
(Wf )feN„ is assumed to be i.i.d. This is equivalent to the knowledge of the correlations 
between the random variables IwtM = 1, . . . ,n , which are constant over t. Then 



]E[(p{w)q){w) ] is a block diagonal matrix. Indeed, we have with E, y := 



p2. 



V2"detE„ JJk. ^(i + t2)(i + ,2)' n 2 L J w [t^l 



and 



Note that the computations of the integrals above can be carried out off-line. We 
define the matrices Ef and E[ with the {i,j)-th entry of Ej being E|^^(z:i;t,,)^(zi;tj)j 
and the (f, y)-th entry of E^ being E|^(j9(r(;f ,)a;tyj, and it follows that the matrices 
Ti = diagjEo, . . . , and T2 = diagjE^, . . . , E^ ^ 



Example 4.4. Consider the system (2.1) as in Example 4.2, and with (p the standard 
saturation function defined as (p{t) = sgn{t) min{|t|, 1}. From Corollary 4.1 we have 
for i - 1,. . . ,n and / = 0, . . . , N - 1, using the identities in Fact 1 in §A.l, 



1 

^ ^ Sin Oi J-00 



In Oi Jo 



f^'-^dt + 



2 r 

yllna, Ji 



= V2^af erff— ^) - 2a?e + 1 + erff— 



K (say). 



and 



y2n Oi J-00 

yln Oi Jo y2n a, Ji 



te '"'"dt 
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= V2^a3erf(-^) 



2a?€ 



Oi Gamma(20^, 1) 



=■■ (say). 



Therefore, in this case the matrix Fi in Corollary 4.1 is diag{E', . . . , E'} with L' := 
diagIC;, . . . , e„], and the matrix is diag(E", . . . , E"} with E" := diag(<S;', . . . , £,',;]. 
These information complete the set of hypotheses of Corollary 4.1, and the prob- 
lem (2.4) can now be solved as a quadratic program (4.2). A 

§4.2. Bounded controls, bounded noise, and p = 2. In this subsection we spe- 
cialize to the case of the noise being drawn from a compact subset of W, and the 
control inputs set 1LJ2. We make the following assumption: 

Assumption 4.5. The noise takes values in a compact set W c ]R". o 

Under Assumption 4.5 Hilbert space techniques may be effectively employed 
in our basic controller synthesis framework of Section 3 in the following way. Let 
{'H, {■, be a separable Hilbert space of measurable maps e : W ^ ILJ2 supported 
on the compact set W. The inner product is defined as {(p\, (p2)'/-( '■- L/Li (<Pi,i/ 'Pij) 
where (■, ■) is the standard inner product on real-valued functions on W. Fix a 
complete orthonormal basis (e^)vej £ f^- Since is separable, the set I is at most 
countable. Just as in (3.1) we let our candidate control policies be of the form 



(4.3) u = 



^0 

r]N-i 
















01,0 








+ 


02,0 


02,1 







_0N-1,O 


0N-1,1 ■ ■ 


0N-l,N-2_ 



e(ifi) 
e(w^N-2) 



=: 7] + &t{w), 



where the vector e(-) is the formal vector formed by concatenating the (ordered) 
basis elements (e^)i,ej, the various 0-s are formal matrices as in Section 3, and ?]f is 
an wi-dimensional vector for t = 0, . . . ,N - 1. This takes us back to the setting of 
Section 3. 

The following Corollary illustrates the technique explained above; its proof will 
only be sketched — it is similar to the proof of Theorem 3.2. Note that for finite- 
dimensional Hilbert spaces, depending on the choice of the orthonormal basis, the 
matrix may have complex or real entries. 

Corollary 4.6. Consider the system (2.1). Suppose that Assumptions 3.1 and 4.5 
hold. Thenforp = 2 and corresponding control set\J2 problem (2.6) under the pol- 
icy {A.3) admits convex tractable reformulation with tighter domains of the decision 
variables {t], 0) defined in (4.3), and is equivalent to the following program: 



(4.4) 



the minimization problem (3.5) 

subject to \\i]t\\ + VN- 1 ||0t|| < li£L/ fort ^0,...,N -1, 
and strictly lower block triangular as in (3.3). 



Moreover, ifH is a finite-dimensional subspace ofH spanned by (e^%ej for some 
finite Q I, then the problem (4.4) admits a reformulation as a quadratically 
constrained quadratic program with respect to the new decision variables ('],©) 
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corresponding to 'H, given by 

the minimization problem (3.5) 

(4.5) subject to [ly @t]\\<: !imL/ Vn for t = 0, . . . ,N - 1, 

and & strictly lower block triangular as in (3.3), 
where the vector e(-) is the vector formed by concatenating the (ordered) basis el- 
ements (e^)vej/ e(Ji') ■= [H'^oV, • ■ • / e(^fN-2)^] / £c := E[e(w;)e(iy)^], E', := E^wt{w)^^. 
In both the above cases the resulting policies are guaranteed to satisfy the control 
constraint set (3.4) for p = 2. 

Proof. (Sketch.) Evaluating the objective function in (2.6) gives the objective func- 
tion in (3.5). Recall that 0f is the f-th block row of the formal matrix 0, and 0f , is the 
ith sub-row of the block row 0f, where t - 0, . . . ,N - 1 and i - 1,. . . ,n. Applying 
the triangle inequality for any t = 0, . . . ,N - 1, we get 



\\rjt + &tt{zv)\\ ^ \\i]t\\ + ||0fe(ii')|| = ||!?f|| + yj{@tt{w),ett{w)). 



= h\\ + \Tj (®tA^)'®tA^)) = h\\ + ^^{N-l)J^ \\®t£ 

} 1=1 ) !=1 

= ||;](|| + VN-1 ||0t|| 

by orthogonality of the basis elements (e^ ),,£/. The right-hand side of the last 
equality appears as the constraint in (4.4). 

For the finite-dimensional case (4.5), we note that the objective function is iden- 



7]f + 0,e(a;) = 



in (4.4), and the constraint 




1 




1 1 


[r]t &t] 


e(iy) 


, and 







This leads to a quadratically constrained quadratic program in the finite- dimen- 
sional decision variables (rj, 0^ □ 

Let us illustrate the usage of Corollary 4.6 through the following example. 

Example 4.7. Consider the system (2.1), and suppose that the n components of the 
noise vector zvt are independent uniform random variables taking values in [-a, a] 
for some a > 1. Therefore, W - [-a, a]". It is a standard fact in Fourier analysis 
that the system |e^"i''(*/(2a)) | v e z| is an orthonormal basis for the Hilbert space of 
square-integrable functions on [-a, a] equipped with the standard inner product 
(f'S) -^£/(f)^(f)df.Welet 

' sin{nvti/a) sin{nvt„/2)''~^ 



ti e [-a,a],i = l,...,n,v = 1,...,M>. 



-Ki^span 

Let t^'{ti, ... ,tn) ■= ^^^[sin(7ivfi/fl), . . . ,sin(7ivf„/fl)j , f,- e [-«,«]. It is clear that the 
M" -valued functions |e^', v = 1, . . . ,m| form an orthonormal set. Indeed, 

" 2 " 1 r" 

{^vi' ev2)'K ~ Xj ^"^''i'" ^"^''^ ~ ~ Xj 2~ I sm{nv2ti/a)dti 
!=i " !=i ^ 

= - X, 4 J (cOs((Vi - V2)7TS,) - COs((Vi + V2)7TS;))ds,- 
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1 otherwise. 

We define Ut := rjt + ett{iv) = /]f + ijlj 0ge(zi;y) = ly + L',~=lL^=iSlj^''i^'^j) for 
appropriate matrices 6"^^ j. Now finding policies of the form (4.3) that minimize the 
objective function in (2.6) becomes straightforward in the setting of Corollary 4.6. 
The matrices Ec and in Corollary 4.6 are now easy to derive from Euler's identity 
e^^ = cos + i sin 6, and the fact that the characteristic function of a uniform random 
variable C supported on [-a, a] is given by E|^e^™^"'j = ^ j^^e^^"'* dt = sinc(27TCfl) 
for some v e H, where the function sine is defined as sinc(4) :- sm{^) /£, if £, i= 
and 1 otherwise. 

An alternative representation of the various matrices may be obtained by looking 
at each component of the policy elements separately. In this approach we define 

t{wt) := [t{wt,i)^ ■■■ e(a;f,„)T]^, t{w) := [t{wo)^ ■■■ e(a;iv-2)^]^ • 

In the above notation rjt^i + L;=o ^j.iK'^jj) is of course the f-th entry of the input Wf 
at time f, where f = 0, . . . , N - 1 and i = 1, . . . ,n. A 

§ 4.3. Constraints on control energy. Some applications require constraints on 
the total control energy expended over a finite horizon. In the framework that 
we have established so far, such constraints are easy to incorporate. Indeed, if we 
require that u^Su ^ jS^ for some preassigned jS > and positive definite matrix S, 
then in the setting of Theorem 3.2 this can be ensured by adjoining the condition 
||7]||g + \\&\\s l|S|L£ ^ jS to the constraints, where H^H^ '■- sJrfMr] is the standard 
weighted 2-norm for a positive definite matrix M. 

Comparison with affine policies. As pointed out earlier affine feedback policies 
from the noise have been previously treated in [Lof03, BTGGN04, GKM06, GK08], 
where the following feedback policy was considered: 

t-i 

(4.6) "t = Xj 

/=o 

In the deterministic setting it was shown in [GKM06] that there exists a one-to- 
one (nonlinear) mapping between control policies in the form (4.6) and the class 
of affine state feedback policies. That is, provided one is interested in affine state 
feedback policies, the parametrization (4.6) constitutes no loss of generality. In 
fact, we shall illustrate in the examples, in the unconstrained inputs case, that the 
performance of this strategy with t{wj) in place of zvi is almost as good as the 
standard LQG controller if not equally good. However, in the constrained inputs 
case this choice is suboptimal in the class of measurable control policies, but it 
ensures tractability of a large class of optimal control problems. It can be seen that 
the solution to the optimization problem (2.4) is tractable with this parametriza- 
tion [GKM06]. However, if the elements of the noise vector w are unbounded, the 
control input (4.6) does not have an upper bound. For the case of bounded inputs, 
the control policy (4.6) under unbounded noise will in general not satisfy the con- 
trol constraint sets (3.4). This unboundedness is a potential problem in practical 
applications, and has been usually circumvented by assuming that the noise input 
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lies within a compact set [BB07, GKM06] and designing a worst-case min-max type 
controller under this assumption. 

It is important to point out that our result in Section 4.2 differs from that in 
[GKM06] in two aspects. First, we are solving the problem on finite-dimensional 
Hilbert spaces with general basis functions as opposed to a finite collection of 
affine functions in [GKM06]. Second, the feasibility of our problem is maintained 
for any bound on the elements of W, as our constraint in (4.5) could still produce 
a feedback gain matrix & that has norm substantially different that 0, whereas if 
there are elements in W with large enough norm and we take the control input to 
heu = rj + &ZV, the constraints produce always a solution with norm very close 
to 0, hence practically only the open-loop term remains in the case of [GKM06]. 

§5. Stability Analysis 

The main result in Theorem 3.2 asserts that the finite horizon optimization 
problem (2.6) is convex and tractable using the policy (3.1). To apply this result in a 
receding horizon fashion, it is imperative to further study some qualitative stability 
properties of the proposed policy. Under this policy the closed-loop system is not 
necessarily Markovian, and as such, standard Foster-Lyapunov methods cannot 
be directly applied. In what follows, we treat the stability problem for p - co and 
Lfmax •= LfJ^ilx- However, this is without any loss of generality, for the same results 
hold (with minor modifications in the proofs) for p = 1, 2 as well. We impose the 
following assumption: 

Assumption 5.1. The matrix A in (2.1) is Schur stable, i.e., the absolute value of 
the eigenvalues of A are all strictly less than 1. <> 

At a first glance this assumption on A might seem restrictive. Indeed, in the 
deterministic setting we know [YSS97] that for discrete-time controlled systems it 
is possible to achieve global asymptotic stability with bounded control inputs if 
and only if the pair (A, B) is stabilizable with arbitrary controls, and the spectral 
radius of A is at most 1. However, the problem of ensuring bounded variance of 
linear stochastic systems with bounded control inputs is to our knowledge still 
largely open; see, however, the recent manuscript [RCMA"'"09] for partial results as 
well as in [BSW02, SSW06]. 

§5.1. Mean-square boundedness. We shall show that the variance of the state is 
uniformly bounded under receding horizon application of the strategy (3.1), for 
any control horizon Nc < N. The receding horizon implementation is iterative in 
nature: the optimization problem is solved every kNc steps, where k e Nq. The 
resulting optimal control policy (applied over a horizon Nc) is given by 

where the control gains depend explicitly on the initial condition x/f^ . For £ = 
1/ ■ ■ ■ ,Nc, the resulting closed-loop system over horizon Nc is given by: 

(5.1) Xkfj^+f = A'^XkN, + Bfnlj^^.j.^^^f_.^{xkN,) + DcWkN,:kN,+t-\, k e No, 



'"^kN,:(k+l)Nc-l^^l'Nc) 



T^lj^i^kNc) 
"k+l(^™.) 

_"ft+l)N.-l(^™t)_ 



STOCHASTIC RECEDING HORIZON CONTROL WITH BOUNDED CONTROL INPUTS 



17 



where Bi := [A^-^B ■■■ AB S], Df := [A^-^ ■■■ A I„x„], and WkN,.,kN,+c-i ■= 

Kn, ■■■ <N,+f-i]^- 

Suppose that the above Nc-horizon optimal policy is computed as in Corol- 
lary 4.1. We define the receding horizon policy corresponding to the consecutive 
concatenation of this Nc-horizon optimal policy as 

(5.2) 71* := (tTq.^_j(Xo), ^N^.:2N,_l(^Wt)' ^2N,:3N,-l(^2N,)/---)- 

Proposition 5.2. Consider the system (2.1), and suppose that Assumptions 3.1 
and 5.1 hold. Forp = oo and any control horizon 1 < Nc < N the receding horizon 
control policy n* renders the closed loop system (5.1) mean-square hounded, i.e., 
^^PfeNo Ei;o[ll^fll^] < °° ior every initial condition Xq e R". 

The proof of this Proposition is postponed to §A.2 in the Appendix. 

§ 5.2. Input-to-state stability. Input-to-state stability (iss) is an interesting and 
important qualitative property of input-output behavior of dynamical systems. 
In the deterministic discrete-time setting [JWOl], iss generalizes the well-known 
bounded-input bounded-output (BIBO) property of linear systems [AM06, p. 490] 
to the setting of nonlinear systems, iss provides a description of the behavior of a 
system subjected to bounded inputs, and as such it may be viewed as an Xcw to 
-Coo gain of a given nonlinear system. In this section we are interested in a useful 
stochastic variant of input-to-state stability; see e.g., [BorOO, ST03] for other possible 
definitions and ideas (primarily in continuous-time). 

Definition 5.3. The system (2.1) is input-to-state stable in Xi if there exist functions 
/3 e and a, yi, 72 g "^00 such that for every initial condition xq e R" we have 

(5.3) IE,„[a(||xt||)] < )S(||xo|| , t) + yi(sup ||w,|L) + y2(l|2:ir) V i e No, 

S€No 

where \\-\\' is an appropriate matrix norm. O 

One evident difference of iss in Xi with the deterministic definition of iss is the 
presence of the function a inside the expectation in (5.3). It turns out that often it 
is more natural to arrive at an estimate of ]EvJa(||Xf||)] for some a e 7C<, than an 
estimate of Ej;J||Xf||]. Moreover, in case a is convex, Jensen's inequality [Dud02, 
p. 348] implies that such an estimate is stronger than an estimate of ExJUxdl]. 

The property expressed by (5.3) is one possible iss-type property for stochastic 
systems. One can come up with alternative stochastic analogs of the iss property, 
such as the following: V e e ]0, 1[ 3 /S e and 3 yi, y2 e "TCo such that P(||Xf|| < 

iS(ll^oll/0 + 7(sup^gj^^ llMslD + yzdPII') e No) > 1-e. Intuitively this means that for 
1 - £ proportion of the sample paths the deterministic iss property holds uniformly. 
However, in an additive i.i.d unbounded noise setting as in (2.1), this property fails 
to hold because almost surely the states undergo excursions outside any bounded 
set infinitely often; in this case the weaker version: Ve e ]0, 1[ 3jS e and 
3 yi, 72 e 'K^ such that p(||xf|| ^ jS(||xo|| , f) + y(sup^_,j^^ ||m,||) + y2(l|2:||')) ^ 1 - e V f e 
No is comparatively better suited. We shall however stick with the iss in Xi property 
in this article. 

The following Proposition can be established with the aid of Proposition 5.2 for 
p = 00; the proofs for p = 1 and 2 are also similar in spirit. 
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Proposition 5.4. Consider the system (2.1), and suppose that Assumptions 3.1 and 
5.1 hold. Then the closed-loop system (5.1) is iss in Xi under the policy n* in (5.2) 
for any 1 ^ N,. < N. 

§6. Numerical Examples 

In this section we present several numerical examples to illustrate the theoret- 
ical results in the preceding sections. We start in Example 6.1 by comparing the 
performance of our policy (3.3) to that of the standard finite horizon LQG con- 
troller whenever the control inputs set U = W, i.e., there are no boimds on the 
norm of the inputs. Then we compare the performance of our policy (3.3) against 
a saturated LQG controller in Example 6.2. Finally in Example 6.3 we illustrate 
the effectiveness of our policy (3.3) compared to the certainty-equivalent receding 
horizon control. 

Example 6.1 (Unconstrained Inputs). A natural question that may arise whenever 
the control inputs in our setup are not constrained, i.e., U = R'", is the following: 
How does the policy (3.3) compare to the globally optimal controller, which in this 
case is the standard finite-horizon LQG controller? One would expect our policy 
to perform worse on the average since we restrict to a class of feedback policies 
that may not contain the globally optimal controller. 

We compared our policy against that of the LQG problem in simulation for two 
controllable 3-dimensional single-input linear systems. In each case we solved an 
unconstrained finite-horizon LQ optimal control problem corresponding to state 
and control weights Qt = 3 13x3 and Rt = 1 for every t. We selected an optimization 
horizon N - 50, and simulated the system responses starting from 10^ different 
initial conditions xq selected at random uniformly from the cube [-100, 100]^, and 
noise sequences zvt corresponding to i.i.d Gaussian noise of mean and (randomly 
chosen) variance 

2.830399255 5.491512606 3.612257417" 
E„, = 5.491512606 11.554870229 6.896706270 . 
3.612257417 6.896706270 4.625993264 

We selected the nonlinear bounded term t{iu) in our policy u = t] + @s{w) to be 
a vector of scalar sigmoidal functions (p{Q :- 0.24/ -^1 + O.OAl^^ applied to each 
coordinate of the vector w. The covariance matrices Ec and Ec' that are required 
to solve the optimization problem (3.3) were computed empirically via classical 
Monte Carlo methods [RC04, Section 3.2] using 10^ i.i.d samples. 
The first system is described by: 



The system pair (A, B) is in Brunovsky canonical form, and A has eigenvalues at 
0.8642, and -0.5571 ± i0.3905. The test results showed that the mean of the ratio of 
the cost corresponding to LQG to the cost corresponding to our policy is 0.99916, 
and the standard deviation of this ratio is 0.003619. 
The second system is described by: 







' 


1 







0' 


(6.1) 










1 


Xk + 









0.4 


0.5 


-0.25 




1 
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^k+l - 
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Xk + 
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This particular system matrix A is in Jordan canonical form and has three eigen- 
values at 1. The test results showed that the mean of the ratio of the cost of LQG 
against the cost of our policy is 0.99673 and the corresponding standard deviation 
is 0.008045. 

Computations for determining our policy in the above two cases were carried 
out in the MATLAB-based software package cvx. In the case of the system (6.2) 
the solver utilized by cvx reported numerical problems in five different runs, for 
which it gave values of the aforementioned ratio below 0.96. Note that we have 
not discarded these five cases from the mean and variance figures reported above. 

The close-to-optimal performance of our policy is surprising in view of the fact 
that the vector-space 1-1 is the linear span of one bounded function, and does 
not contain the theoretically optimal linear (in the current state) controller. We 
conjecture that this is due to injectivity of the mapping e, due to which e(tf f) retains 
all information generated by ifj. Of course, in the absence of control constraints 
our solution is much more computationally demanding than the LQG controller, 
and would not be used in practice in this case. A 

Example 6.2 (Saturated LQG and Receding Horizon). We compare the performance 
of saturated LQG against our policy (3.3) for the system (6.2) in this example. We 
fixed the optimization horizon N - 2, the control horizon Nc ~ 1, and the weight 
matrices for the states and the control to be Qt - Isxs and Rt = 0.01 for all t, 
respectively. The control bounds in both cases was [-2,2], the nonlinear bounded 
term t{wt) in our policy u = rj + &t{w) was a vector of scalar standard saturation 
functions applied to each coordinate of the vector Wt, and the LQG control input 
was saturated at ±2. The covariance matrices Zc and Ee' required to solve the 
optimization problem (3.5) were computed empirically via classical Monte Carlo 
integration methods [RC04, Section 3.2] using 10^ i.i.d samples. 

We simulated the system (6.2) starting from the same initial condition Xq = 

[O o] for 100 different independent realizations of the noise sequence Wf over 
a horizon of 200. The behavior of the average (over the 100 realizations) cost 
corresponding to the two scenarios is shown in Figure 1. The simulations were 
coded in MATLAB and the optimization programs were coded in the software 
package cvx. The average total cost incurred at the end of the simulation horizon 
when using the saturated LQG scheme above was 1.790 x 10^^ units, whereas the 
average total cost incurred at the end of the simulation horizon (f - 200) using our 
policy (3.3) in a receding horizon fashion was 4.486 X 10^ units. A 

Example 6.3 (Constrained Inputs). Consider the 2-dimensional linear stochastic 
system: 

(6.3) 





1.23 


-0.15 




0.14 






0.25 


1 


Xf + 


0.12 


Mf + Wt, 



where {zvt)t€Kg is a sequence of i.i.d Gaussian noise with zero mean and (ran- 

, , . [2.722030613 4.9759996931 , , 

domly generated) variance 4 975999^93 9.io2559685 ' v^eight matrices 

corresponding to the states and control be Qt = I2x2 and Rt - 0.8 for each t. The 
covariance matrices Ec and Ee' that are required to solve the optimization problem 
(3.3) were computed empirically via classical Monte Carlo integration methods 
[RC04, Section 3.2] using 10^ samples. 
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We fixed the optimization horizon N - 7, the nonlinear saturation s{wt) to be 
a vector of scalar sigmoidal functions (p{Q :- 0.24/ -^1 + O.OAl^^ applied to each 
coordinate of the vector Wt, and compared the certainty-equivalent MFC strategy 
{Nc = 1, @ = 0, Wt = 0) against our receding horizon strategy (3.3) with control 
horizon Nc = 4. The control constraints in both cases were Ut e [-200, 200]. We sim- 
ulated the system in both cases starting from the same initial condition Xq = [o oj , 
for 60 different realizations of the noise sequence zvt; plots of states, average cost, 
and standard deviation are shown in Figures 2 and 3. The average cost incurred 
when using the certainty-equivalent MFC scheme was 7.893 x 10^ units, whereas 
the average cost incurred when using our policy (3.3) in a receding horizon fash- 
ion was 3.141 X 10^ units. Therefore, applying our policy in a receding horizon 
fashion one saves 60.2% of the cost corresponding to the certainty -equivalent MFC 
controller on the average. This example illustrates that there may be cases where 
open-loop certainty-equivalent MFC, in the absence of state-constraints, is outper- 
formed by a large margin by a judiciously selected receding-horizon strategy. The 
simulations were coded in YALMIP and were solved using SDPT-3; the solver-time 
statistics (in sec.) for the certainty-equivalent MFC and receding horizon schemes 
were as follows: 





certainty-equivalent MFC 


receding horizon 


Mean 


32.127 


59.615 


Standard deviation 


4.610 


21.675 


Maximum 


50.590 


90.036 


Minimum 


20.240 


20.466 
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These statistics correspond to the above simulations carried out on an x86_64 octa- 
core machine with 24GB RAM, each processor of which was an Intel® Xeon® CPU 
E5540 2.53GHz with cache size 8192 KB, running GNU/Lrnux. 
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Figure 2. Plots of states corresponding to: certainty-equivalent 
MPC with Nc = 1 (left) and our receding horizon control scheme 
with Nc = 4 (right) in Example 6.3. 
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(a) Plot of average costs 



(b) Plot of standard deviations 



Figure 3. Plots of average cost (left) and standard deviations 
(right) corresponding to: certainty-equivalent MPC with Nc ~ 1 
and our receding horizon control scheme with Nc = 4 in Example 
6.3. 



We also applied the first four control values of the certainty-equivalent scheme 
and compared it against our receding horizon scheme using policy (3.3), i.e., Nc - 4: 
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for both controllers. We simulated the system in both cases starting from the same 

initial condition xq = ^0 oj , for 60 different realizations of the noise sequence zvt; 
plots of the states, average cost, and standard deviation are shown in Figures 4 
and 5. The average cost incurred when using the certainty-equivalent with control 
horizon Afc = 4 was 4.21 1 X 10^ units, whereas the average cost incurred when using 
our policy (3.3) in a receding horizon fashion was 3.295 x 10^ units. We see that 
by applying our policy in a receding horizon fashion one saves 21.7% of the cost 
corresponding to the certainty equivalence controller on the average. The simula- 
tions were coded in YALMIP and were solved using SDPT- 3; the solver-time statistics 
(in sec.) for the certainty-equivalent and receding horizon schemes were as follows: 





certainty-equivalent 


receding horizon 


Mean 


7.537 


67.494 


Standard deviation 


0.812 


11.845 


Maximum 


9.776 


85.232 


Minimum 


6.101 


43.601 



These statistics correspond to the above simulations carried out on an x86_64 octa- 
core machine with 24GB RAM, each processor of which was an Intel® Xeon® CPU 
E5540 2.53GHz with cache size 8192 KB, running GNU/Lrnux. A 
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Figure 4. Plots of states corresponding to: certainty-equivalent 
with Nc ~ 4l (left) and our receding horizon control scheme with 
Nc = 4 (right) in Example 6.3. 



§7. Conclusion and Future Directions 

We provided tractable solutions to a variety of finite-horizon stochastic optimal 
control problems with quadratic cost, hard control constraints, and unbounded 
additive noise. These problems arise as parts of solutions to the stochastic receding 



STOCHASTIC RECEDING HORIZON CONTROL 



WITH BOUNDED CONTROL INPUTS 



23 



4.5 



35 



4 



certainty-equivalent 

receding horizon 



certainty-equivalent 

receding trorizon . ^ 




3.5 




50 



too 

time 



150 



200 







50 



too 

time 



150 



200 
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(b) Plot of standard deviations 



Figure 5. Plots of average cost (left) and standard deviations 
(right) corresponding to: certainty-equivalent with Nc = 4 and 
our receding horizon control scheme with = 4 in Example 6.3. 



horizon problems (2.4). The control policy obtained as a result of the finite-horizon 
optimal control sub-problems may be nonlinear with respect to the previous states, 
and the policy elements are chosen from a vector space that is largely up to the 
designer. One of the key features of our approach is that the variance-like matrices 
employed in the finite-horizon optimal control sub-problems may be computed off- 
line, and we illustrated this feature with several examples. We demonstrated that 
applying our obtained policies in a receding horizon fashion results in bounded 
state variance. Finally, we provided several numerical examples that illustrate 
the effectiveness of our method with respect to the commonly used certainty- 
equivalent MFC controllers. 

The development in this article affords extensions in several directions. One is 
the incorporation of state constraints. As discussed in §1, hard state constraints 
do not make sense in the stochastic with additive unbounded noise setting unless 
one is prepared to artificially relax them once infeasibility is encountered. Froba- 
bilistic constraints and integrated chance constraints [Han83] constitute popular 
alternative methods to impose constraints on the state that are more probabilistic 
in nature. It will be interesting to see how the approach introduced in this arti- 
cle reacts to state-constraints. A second direction is to consider specific kinds of 
nonlinear models, particularly those which involve multiplicative noise, in our 
framework, and a third is to consider different objective functions such as affine 
fimctions given by the {oo and the norms. 



We are indebted to Soumik Fal for pointing out the possibility of representing 
policies as elements of a vector space. We thank Colin Jones for some useful discus- 
sions on convexity of some of the optimization programs, and the three anonymous 
reviewers for their valuable suggestions that have led to substantial improvements 
of the original manuscript. 
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Appendix 

§A.l. Some identities. Recall the following standard special mathematical func- 
tions: the standard error function erf(z) := e'^dt and the complementary error 
function [AS64, p. 297] defined by erfc(z) := 1 -erf(z) for z e ]R, the incomplete Gamma 
function [AS64, p. 260] defined by T{a,z) := f'^e'^dt for z,a > 0, the confluent 
hypergeometric function [AS64, p. 505] defined by U{a,b,z) :- e~^'t"~^{l + 
^•^h-a-i^f. Q J, 2 > 0, and r is the standard Gamma function. All of these are 
implemented as standard functions in Mathematica. The following facts can be 
found in [AS64] and are collected here for completeness. 

Facts about Special Functions. For > Owe have 

^ f e-^df = ^(l+erf(^)) 
2noJz 2V ^^j2o" 

1 P f2 _A , 1/ _i , / 1 NX 

— - re 2.,2 df = - V27T cr - Tie 2„2 erfc — ^ 

7 -J- 

cr e 2„2; 



-L^ ft-e-^dt^J^a^e4^)- 



1 r 



te ^"^dt - — — Gamma(2cJ^,l) 



In a Ji yln 

2 



V2^aJo VTTF 2V2^ ^2' ' 2a^' 



§A.2. Proof of mean-square boundedness. 

Proof of Proposition 5.2. Fix Xq e R". For any n xn matrix P = P^ > 0, using (5.1) 
and the fact that E [t{w)] = 0, we see that for every { = 1, - ■ ■ ,Nc 

+ ^^KN, [W^C^kN.-.kN.+e-li^kN,) + DcWkN,:kN,+{-l\\p\, 

where II^IIp := V^^^. Using the fact that 7i*j^,.tiv,+f_i(^fcN,) ^ < /imax by construc- 
tion, we obtain the following bound: 

(A.l) E,,,^ [xl^^^fPxkN^^^e] < ^nS^'^P^' 

XkN, + ^Cif ||^/cN,||^ + C2e, 

where 

cic := m \\{A^)'^PBc\\^ !J„ax, 

C2e ■- m \\B]PBe\\^ (J^^ + tr(DjPDeL,,) 
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h max \tr(r{xtNfBjPBcr{xkN^Ai) + 2tr(T{xkNyB]PDeA2)l 



and T(xj;N,) ■= 



Since A is a Schur stable matrix (and hence so is A^) there exists [Ber09, Proposition 
11.10.5] a matrix Pf = Pj > with real-valued entries that satisfies {A'^)^PeA^ -Pf = 
-I„x«; in particular, we have xj^^ (A^^PfA^^icN, < xJ^ PcXkN, - xlj^XkN,- Therefore, 
with P = in (A.l) we arrive at 

(A.2) E.vj^^_ [xl^_^fPcXkN,+f\ < xl;^P(XkN, - \\xkN,f + 2Ci[ \\xkN,\\^ + Cll- 

For Cf e ]max{0,l - Amax{Pd]A[ let rt := ^{cn + yjc^^ + C2eQe)- Then elementary 
properties of the quadratic function g{y) :- -Qey^ + "^-Cuy + C2t show that 

-Q ll^fcNjll + 2cif II^JckII^ + C2f < whenever \\XkN,\^ > rt, 
In view of the above fact, simple manipulations in (A.2) lead to 

^■ttNc[^Hv,+/-P^^*N,+f] < xJj^PeXkN, - (1 - Q) \\xkN,f whenever \\xkN,\\^ > re, 

from which, letting pf := (l — a "^"(Pf) )^ arrive at 

(A.3) E.,.,^,[x[^ .^^P^XfcN,+f] ^ pexJj^PeXkN, whenever ||xjcn,|L > 
Let us define 

p := max p(, r' := max Vf, 

A := max \ra&x{Pt), ^ '■= min Amin(f'f)- 

^=1 - e=i N, 

Then we can obtain using (A.3) the conservative bound for every { = !,.. . ,Nc: 

^x,Nc[^lj^,+(^Nc^kN,+e] < p'xJ^PNXkN, whenever \\xkN,\\^ > r' , 
where p' := Pii^ff^- It follows immediately that 

(A.4) E.^,^^, [xl^^^fPN.XkN.+f] < p'xJf^fN.XkN, + b'lK'iXkN,), 

where K' := e K"\mL ^ r']. 

Let us define the function V{^) := £,'^Pn,^, and fix e N and £ = l,...,Nc. Let 
Kn^ := {<5 e ]R"|||£|L < r^], b := supE,.[y(xN,)], and b' := max supE,,[y(xf)]. 

X€K (-\,...,Nc j.£j<;/ 

From (A.4) we get 

Exo[V(XfcN^+^)] = E.,.„[E[y(X;tN,+f) |^)cN,]] ^ ^x\p'y{XkN,) + b'lK'{XkNS[ 

< E.^.„[p'E[y(XtN,) |^()c-l)N,] + b'lK'{XkNS\ 

^ E.,„[p'piv^,y(X(t_i)N,) + blK^^{X(k-l)N) + b'lK'{XkNS\ 

Jt-l 

< p'p\,y(x) + bp],-'-'E,,[lK,^{x,N,)] + b'E,,[lK>{x,N^)] 

!=0 



28 



D. CHATTERJEE, P. HOKAYEM, AND J. LYGEROS 



(A.5) < p'plVix) + \ + b'. 

Note that the conditioning in the first few steps of (A.5) is well-defined because 
it is performed every Nc steps starting from 0, and the structure of our policy n* 
makes the process (Xfjv JfeNo Markovian. Therefore, it follows from (A.5) that for all 
t := kNc + €, 

SUpE:fJ||x,||^] ^ ——SUp^x\y{XkN,+{)\ 

AminU nJ \ '-^ PNc / 

< oo, 

where the last step follows from the fact that p^, < 1 ■ This completes the proof. □ 
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